Coordinated Checkpointing using Vector Timestamp in Grid Computing
Abstract
In grid computing, system recovery relies on checkpoints recorded at each node. The resource manager must recover the system while preserving global consistency in order to prevent the domino effect. Coordinated checkpointing, in which all processes are synchronized, is currently in wide use. Because this synchronization incurs overhead, we present a coordinated checkpointing protocol that uses vector timestamps to reduce it. The proposed protocol aims to reduce the idle time of every process by tracking the number of events that have occurred. We also evaluate the performance of the proposed protocol in an experiment on a parallel computation across eight nodes. The experiment showed an average reduction in overhead time of 55 percent per process. These results indicate that the proposed protocol is effective for scalable grid computing.
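The abstract does not spell out the protocol itself, so the following is only a minimal sketch of the general idea it builds on: vector timestamps and a check that a set of per-process checkpoints forms a globally consistent cut. The names `VectorClock` and `is_consistent_cut` are hypothetical and are not taken from the paper; the consistency condition is the standard vector-clock one, not necessarily the authors' method.

```python
from typing import List


class VectorClock:
    """Vector timestamp kept by one of n processes (hypothetical helper)."""

    def __init__(self, n: int, pid: int) -> None:
        self.pid = pid
        self.v = [0] * n

    def tick(self) -> List[int]:
        """Local or send event: advance own component and return a copy
        that can be attached to an outgoing message."""
        self.v[self.pid] += 1
        return list(self.v)

    def receive(self, msg_ts: List[int]) -> None:
        """Receive event: take the component-wise maximum with the
        message timestamp, then advance own component."""
        self.v = [max(a, b) for a, b in zip(self.v, msg_ts)]
        self.v[self.pid] += 1


def is_consistent_cut(checkpoints: List[List[int]]) -> bool:
    """checkpoints[i] is the vector timestamp process i saved at its checkpoint.

    The cut is consistent if no process j has observed more events of
    process i than i itself had recorded at its own checkpoint.
    """
    n = len(checkpoints)
    return all(
        checkpoints[j][i] <= checkpoints[i][i]
        for i in range(n)
        for j in range(n)
    )


# Process 0 checkpoints, then sends a message that process 1 receives
# before taking its own checkpoint.
p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
early_c0 = list(p0.v)      # checkpoint taken before the send
ts = p0.tick()             # send event on process 0
late_c0 = list(p0.v)       # checkpoint taken after the send
p1.receive(ts)
c1 = list(p1.v)            # checkpoint taken after the receive

orphan_case = is_consistent_cut([early_c0, c1])
clean_case = is_consistent_cut([late_c0, c1])
```

In the first case the receive is recorded but the matching send is not, which is the kind of orphan message that forces rollbacks to cascade (the domino effect); in the second case both events are inside the cut.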
Related resources
Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid
Grid computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling are important issues for large-scale computational grids because of the unreliable nature of grid resources. A commonly exploited technique to realize fault tolerance is periodic checkpointing, which periodically ...
An Enhanced MSS-based Checkpointing Scheme for Mobile Computing Environment
Mobile computing systems are made up of different components, among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environments. In the suggested scheme, nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and as a result the workload of Mobile ...
Independent checkpointing in a heterogeneous grid environment
The EU-funded XtreemOS project implements an open-source grid operating system based on Linux. In order to provide fault tolerance and migration for grid applications, it integrates a distributed grid-checkpointing service called XtreemGCP. This service is designed to support different checkpointing protocols and to address the underlying grid-node checkpointers (e.g. BLCR, LinuxSSI, OpenVZ, etc...
Soft-Checkpointing Based Coordinated Checkpointing Protocol for Mobile Distributed Systems
Minimum-process coordinated checkpointing is a suitable approach to introduce fault tolerance in mobile distributed systems transparently. It may require blocking of processes, extra synchronization messages, or taking some useless checkpoints. All-process checkpointing may lead to exceedingly high checkpointing overhead. To optimize both metrics, the checkpointing overhead and the loss of compu...
Adaptive Two-Level Blocking Coordinated Checkpointing for High Performance Cluster Computing Systems
Blocking coordinated checkpointing is a well-known method for achieving fault tolerance in cluster computing systems. In this work, we introduce a new approach for blocking coordinated checkpointing using two-level checkpointing. The first level of checkpointing is local checkpointing, and computing nodes save the checkpoints in local disk. If a transient failure occurs in the computing node, t...